Bar charts of crime numbers against subway stations, in decreasing order

## Parsed with column specification:
## cols(
##   CMPLNT_NUM = col_integer(),
##   CMPLNT_FR_DT = col_date(format = ""),
##   CMPLNT_FR_TM = col_time(format = ""),
##   hour = col_double(),
##   hour_interval = col_character(),
##   PD_DESC = col_character(),
##   LAW_CAT_CD = col_character(),
##   BORO_NM = col_character(),
##   ADDR_PCT_CD = col_double(),
##   PREM_TYP_DESC = col_character(),
##   Latitude = col_double(),
##   Longitude = col_double(),
##   Lat_Lon = col_character(),
##   Subway_Station_ID = col_integer(),
##   Station_Name = col_character(),
##   Line_Name = col_character(),
##   Division = col_character(),
##   Station_Name_w_Line = col_character()
## )

These bar charts show the top 5 stations by crime count, and the rankings for the 3 crime types were closely related. For example, 125 ST (lines 4, 5, 6) and 23 ST were in the top 5 for all 3 types of crime. Among the 3 possible pairings of crime type (felony-misdemeanor, felony-violation, misdemeanor-violation), the similarity was strongest between misdemeanor and violation. This was not surprising, because misdemeanors and violations are closer to each other in severity than either is to felony, which is a more serious type of crime.
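The ranking and overlap comparison described above could be reproduced along these lines. This is a minimal base R sketch, not the code behind the charts: the rows are made up, the station labels `A` to `C` are placeholders, and `top_stations()` is a hypothetical helper. Only the column names `Station_Name_w_Line` and `LAW_CAT_CD` come from the column specification.

```r
# Made-up complaint records: one row per complaint.
crimes <- data.frame(
  Station_Name_w_Line = c("A", "A", "A", "B", "B", "C", "A", "B", "C", "C"),
  LAW_CAT_CD = rep(c("FELONY", "MISDEMEANOR"), times = c(6, 4)),
  stringsAsFactors = FALSE
)

# Count complaints per station and crime type.
crimes$crime_count <- 1L
counts <- aggregate(crime_count ~ Station_Name_w_Line + LAW_CAT_CD,
                    data = crimes, FUN = sum)

# Hypothetical helper: the n stations with the most complaints of one type.
# Ties keep the order in which the stations appear in `counts`.
top_stations <- function(counts, crime_type, n = 5) {
  sub <- counts[counts$LAW_CAT_CD == crime_type, ]
  sub <- sub[order(-sub$crime_count), ]
  head(as.character(sub$Station_Name_w_Line), n)
}

top_felony <- top_stations(counts, "FELONY", n = 2)
top_misdemeanor <- top_stations(counts, "MISDEMEANOR", n = 2)

# Stations that appear in both rankings.
shared <- intersect(top_felony, top_misdemeanor)
print(shared)
```

The size of `shared` for each pair of crime types gives a simple number to compare across the three pairings.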

Out of interest, we also looked at the 10 “safest” stations. The patterns were harder to infer than for the most dangerous stations because there were many ties. Two interesting observations: Wall Street station was one of the safest for both felony and misdemeanor, and the station at 116th Street (Columbia University) was also one of the safest for misdemeanor.

Box plots of crime numbers by time of the day

For overall crime count, the median did not vary greatly across the time periods. Crime count was highest from 1600-2000hr, which coincided with the evening peak period. Based on that, we expected the morning peak (0800-1200hr) to show the next highest crime count, but the data did not support that. Instead, 1200-1600hr showed the second highest crime count based on the median. Also, the spread (the interquartile range, indicated by the length of the box) was largest for the time periods with the highest median crime count. The patterns we saw in the overall count were similarly visible for felony and misdemeanor. For violation, the only similarity with the other crime types was that 1600-2000hr was the period with the highest crime count.
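The length of a box in a box plot is the interquartile range, so the comparison of medians and spreads above can also be checked numerically. A minimal sketch, assuming one row per day and time period; the counts are made up and the `hour_interval` labels are a guess at the format used in the data.

```r
# Made-up daily crime counts for two time periods.
daily <- data.frame(
  hour_interval = rep(c("0800-1200", "1600-2000"), each = 5),
  crime_count = c(2, 3, 3, 4, 4, 4, 6, 8, 5, 9)
)

# Median (the line inside the box) and IQR (the length of the box).
median_by_interval <- tapply(daily$crime_count, daily$hour_interval, median)
iqr_by_interval <- tapply(daily$crime_count, daily$hour_interval, IQR)

stats <- data.frame(
  hour_interval = names(median_by_interval),
  median = as.vector(median_by_interval),
  iqr = as.vector(iqr_by_interval)
)

# Periods ordered from highest to lowest median.
print(stats[order(-stats$median), ])
```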

Scatter plots of Subway Human Traffic against Crime Count (by weekend, weekday; by crime type). One data point = one day of the year

## Parsed with column specification:
## cols(
##   date = col_date(format = ""),
##   date_time = col_datetime(format = ""),
##   interval = col_character(),
##   day_of_week = col_integer(),
##   day = col_character(),
##   hour = col_integer(),
##   is_holiday = col_character(),
##   station_id = col_integer(),
##   station = col_character(),
##   lines = col_character(),
##   entry_volume = col_double(),
##   exit_volume = col_double(),
##   Station_Name_w_Line = col_character()
## )

For the above and subsequent similar scatter plots, each data point represents a single day, i.e., the total human traffic and the total number of crimes committed across all stations on that day.

We created the above scatter plots to investigate if there was a relationship between human traffic and crime rate at subway stations.

We plotted the charts separately by weekday and weekend to isolate any effects that weekday vs weekend might have on the relationship between human traffic count and crime count.

Overall, there was a positive correlation between human traffic and crime count for both weekdays and weekends. A similar pattern was observed for felony and misdemeanor. However, the relationship was less obvious for violation, for which the sample sizes were small.
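The visual impression of a positive relationship can be backed by a correlation coefficient computed separately for weekdays and weekends. The sketch below assumes a table with one row per day; the `traffic` and `crime_count` values are made up and in arbitrary units, `day_type` is a hypothetical column name, and holidays are not treated specially.

```r
# Made-up daily totals for two full weeks starting on a Monday.
daily <- data.frame(
  date = as.Date("2015-01-05") + 0:13,
  traffic = c(50, 52, 51, 53, 54, 28, 22, 51, 53, 52, 54, 55, 29, 23),
  crime_count = c(8, 11, 9, 10, 12, 5, 3, 9, 10, 9, 12, 11, 6, 4)
)

# POSIXlt weekday numbers are locale-independent: 0 = Sunday, 6 = Saturday.
wday <- as.POSIXlt(daily$date)$wday
daily$day_type <- ifelse(wday %in% c(0, 6), "weekend", "weekday")

# Pearson correlation between traffic and crime count within each group.
cors <- sapply(split(daily, daily$day_type),
               function(d) cor(d$traffic, d$crime_count))
print(cors)
```

Splitting before computing the correlation matters here: pooling both groups would mix the within-group relationship with the large weekday-weekend gap in traffic.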

Scatter plots of Subway Human Traffic against Rainfall (by weekend, weekday). One data point = one day of the year

## # A tibble: 238 x 2
##    date       traffic
##    <date>       <dbl>
##  1 2015-01-02 4203394
##  2 2015-01-05 5013683
##  3 2015-01-06 5094287
##  4 2015-01-07 5165893
##  5 2015-01-08 5118367
##  6 2015-01-09 5188385
##  7 2015-01-12 4893644
##  8 2015-01-13 5283318
##  9 2015-01-14 5302969
## 10 2015-01-15 5310834
## # ... with 228 more rows
## # A tibble: 106 x 2
##    date       traffic
##    <date>       <dbl>
##  1 2015-01-01 2145360
##  2 2015-01-03 2877233
##  3 2015-01-04 2258170
##  4 2015-01-10 2871221
##  5 2015-01-11 2240056
##  6 2015-01-17 2923245
##  7 2015-01-18 2066586
##  8 2015-01-19 3194825
##  9 2015-01-24 2729458
## 10 2015-01-25 2384640
## # ... with 96 more rows
## Parsed with column specification:
## cols(
##   date = col_date(format = ""),
##   hour_interval = col_character(),
##   total_precip = col_double()
## )
## Joining, by = "date"
## Joining, by = "date"
## # A tibble: 237 x 3
##    date       traffic total_rainfall
##    <date>       <dbl>          <dbl>
##  1 2015-01-02 4203394         0     
##  2 2015-01-05 5013683         0     
##  3 2015-01-06 5094287         0.0800
##  4 2015-01-07 5165893         0     
##  5 2015-01-08 5118367         0     
##  6 2015-01-09 5188385         0.250 
##  7 2015-01-12 4893644         0.860 
##  8 2015-01-13 5283318         0     
##  9 2015-01-14 5302969         0     
## 10 2015-01-15 5310834         0     
## # ... with 227 more rows
## # A tibble: 106 x 3
##    date       traffic total_rainfall
##    <date>       <dbl>          <dbl>
##  1 2015-01-01 2145360          0    
##  2 2015-01-03 2877233          1.54 
##  3 2015-01-04 2258170          0.550
##  4 2015-01-10 2871221          0    
##  5 2015-01-11 2240056          0    
##  6 2015-01-17 2923245          0    
##  7 2015-01-18 2066586          4.03 
##  8 2015-01-19 3194825          0    
##  9 2015-01-24 2729458          1.07 
## 10 2015-01-25 2384640          0    
## # ... with 96 more rows

The above scatter plots investigated if there was a relationship between rainfall and human traffic at subway stations.

For weekdays, we saw that human traffic was not much influenced by rainfall, which was not surprising because most commuters had to go to work or school regardless of whether it was raining. For weekends, we saw a stronger negative relationship between rainfall and human traffic, which made sense because people might cancel outdoor activities or leisure travel plans depending on the weather.

We also saw that most of the data points were clustered near the y-axis (zero rainfall), because on most days there was no rain.
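The join and the two observations above (many dry days, and the direction of the rainfall-traffic relationship) can be sketched as follows. The values are the first ten rows of the weekend table printed above; ten rows are far too few to draw conclusions from, so this only illustrates the steps. The variable names `dry_share` and `r` are introduced here for illustration.

```r
# First ten rows of the weekend traffic table printed above.
traffic <- data.frame(
  date = as.Date(c("2015-01-01", "2015-01-03", "2015-01-04", "2015-01-10",
                   "2015-01-11", "2015-01-17", "2015-01-18", "2015-01-19",
                   "2015-01-24", "2015-01-25")),
  traffic = c(2145360, 2877233, 2258170, 2871221, 2240056,
              2923245, 2066586, 3194825, 2729458, 2384640)
)

# Daily rainfall totals for the same dates, from the joined table above.
rain <- data.frame(
  date = traffic$date,
  total_rainfall = c(0, 1.54, 0.55, 0, 0, 0, 4.03, 0, 1.07, 0)
)

# Inner join on date, mirroring the "Joining, by = date" step.
joined <- merge(traffic, rain, by = "date")

# Share of days without rain: these points sit on the y-axis of the plot.
dry_share <- mean(joined$total_rainfall == 0)

# Pearson correlation between rainfall and traffic.
r <- cor(joined$total_rainfall, joined$traffic)
print(c(dry_share = dry_share, correlation = r))
```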

Scatter plots of Crime Count against Rainfall (by crime type). One data point = one day of the year

## # A tibble: 365 x 2
##    CMPLNT_FR_DT crime_count
##    <date>             <int>
##  1 2015-01-01            10
##  2 2015-01-02             9
##  3 2015-01-03             5
##  4 2015-01-04             2
##  5 2015-01-05             5
##  6 2015-01-06             8
##  7 2015-01-07            10
##  8 2015-01-08            10
##  9 2015-01-09            19
## 10 2015-01-10            20
## # ... with 355 more rows
## # A tibble: 946 x 3
## # Groups:   CMPLNT_FR_DT [?]
##    CMPLNT_FR_DT LAW_CAT_CD  crime_count
##    <date>       <chr>             <int>
##  1 2015-01-01   FELONY                5
##  2 2015-01-01   MISDEMEANOR           5
##  3 2015-01-02   FELONY                5
##  4 2015-01-02   MISDEMEANOR           4
##  5 2015-01-03   FELONY                5
##  6 2015-01-04   MISDEMEANOR           2
##  7 2015-01-05   FELONY                1
##  8 2015-01-05   MISDEMEANOR           4
##  9 2015-01-06   FELONY                4
## 10 2015-01-06   MISDEMEANOR           3
## # ... with 936 more rows
## # A tibble: 364 x 2
##    date       total_rainfall
##    <date>              <dbl>
##  1 2015-01-01         0     
##  2 2015-01-02         0     
##  3 2015-01-03         1.54  
##  4 2015-01-04         0.550 
##  5 2015-01-05         0     
##  6 2015-01-06         0.0800
##  7 2015-01-07         0     
##  8 2015-01-08         0     
##  9 2015-01-09         0.250 
## 10 2015-01-10         0     
## # ... with 354 more rows
## # A tibble: 364 x 3
##    date       total_rainfall crime_count
##    <date>              <dbl>       <int>
##  1 2015-01-01         0               10
##  2 2015-01-02         0                9
##  3 2015-01-03         1.54             5
##  4 2015-01-04         0.550            2
##  5 2015-01-05         0                5
##  6 2015-01-06         0.0800           8
##  7 2015-01-07         0               10
##  8 2015-01-08         0               10
##  9 2015-01-09         0.250           19
## 10 2015-01-10         0               20
## # ... with 354 more rows

The above scatter plots investigated if there was a relationship between crime rate and rainfall at subway stations.

There did not seem to be a strong relationship between overall crime count and rainfall, but we did notice that on days with heavy rain, crime count was never high, especially when we drilled down to misdemeanor and violation. One explanation would be that heavy rainfall deterred potential offenders from travelling to the subway stations.

However, we noticed that this was not true for felony. High counts of felony were observed even for days with heavy rain, suggesting that felony was less dependent on the weather.

Previously, we saw that subway human traffic was sensitive to rainfall on weekends but not so much on weekdays. Therefore, we wondered if crime count would likewise be more sensitive to rainfall on weekends. Based on visual inspection, the relationship between crime count and rainfall did not appear to differ between weekdays and weekends.
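The observation that crime count was never high on days with heavy rain can be checked by comparing the maximum daily count on heavy-rain days with the maximum on all other days. The sketch uses the first ten rows of the joined rainfall and crime table printed above. The cutoff `heavy_threshold` is a hypothetical value chosen only for illustration; the source does not define what counts as heavy rain.

```r
# First ten rows of the joined rainfall / crime count table printed above.
daily <- data.frame(
  date = as.Date("2015-01-01") + 0:9,
  total_rainfall = c(0, 0, 1.54, 0.55, 0, 0.08, 0, 0, 0.25, 0),
  crime_count = c(10L, 9L, 5L, 2L, 5L, 8L, 10L, 10L, 19L, 20L)
)

# Hypothetical cutoff for a heavy-rain day (same units as total_rainfall).
heavy_threshold <- 0.5
daily$heavy_rain <- daily$total_rainfall >= heavy_threshold

# Highest daily crime count within each group ("FALSE" / "TRUE").
max_by_rain <- tapply(daily$crime_count, daily$heavy_rain, max)
print(max_by_rain)
```

Running the same comparison on each crime type separately would show whether felony behaves differently from misdemeanor and violation.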

Scatter plots of Crime Count against Subway Human Traffic (by crime type; by weekend, weekday). One data point = one subway station

## Joining, by = "Station_Name_w_Line"
## Joining, by = "Station_Name_w_Line"
## Joining, by = "Station_Name_w_Line"

## Joining, by = "Station_Name_w_Line"
## Joining, by = "Station_Name_w_Line"
## Joining, by = "Station_Name_w_Line"

In the previous set of scatter plots, each point represented a day, aggregated across all subway stations. For this set of scatter plots, we instead aggregated across time and let each point represent a unique subway station. The focus here was to investigate if a subway station with higher traffic also suffered from a higher crime count.

We noticed a general trend of higher crime count at stations with higher human traffic, regardless of weekday or weekend and of crime type. There were two outliers with lower traffic but very high crime count (23 ST on line 6 and 125 ST on lines 4, 5 and 6), which meant that their high crime counts could not be well explained by human traffic alone. One other factor could be whether the surrounding neighborhood tended to have a higher crime rate. Lower traffic could also work in the opposite direction: a more isolated station might attract more potential offenders, since crimes could more easily be committed unseen.
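One way to make "not well explained by human traffic alone" concrete is to fit a straight line of crime count on traffic and look for stations with a large positive residual. The source identified the outliers visually; the linear fit is an assumption introduced here. The station labels `S1` to `S7` and all values are made up.

```r
# Made-up per-station totals; S7 has low traffic but a high crime count.
stations <- data.frame(
  Station_Name_w_Line = paste0("S", 1:7),
  traffic = c(1, 2, 3, 4, 5, 6, 2),
  crime_count = c(10, 20, 30, 40, 50, 60, 90),
  stringsAsFactors = FALSE
)

# Simple linear model: crime count explained by traffic alone.
fit <- lm(crime_count ~ traffic, data = stations)

# Positive residual = more crime than the traffic level would predict.
stations$residual <- as.numeric(resid(fit))

# Station with the largest positive residual.
flagged <- stations$Station_Name_w_Line[which.max(stations$residual)]
print(flagged)
```

A single extreme station also pulls the fitted line toward itself, so with real data a robust fit or a visual check remains useful alongside the residuals.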

Time Series of Crime Count across time; superimposed with Subway Human Traffic, Rainfall

## Joining, by = "date"

The above set of time series was meant to explore the trend of crime across time and also to see if it varied in the same direction as the other parameters of human traffic and rainfall.

Firstly, we could see large fluctuations in crime count across time. These could not be caused by intra-day patterns because we were already looking at total crime count per day. To check for a day-of-week effect, we looked at crime count across time for weekdays only and then for weekends only, but the fluctuations remained. It was clear that there were other factors influencing crime count that were not reflected in our analysis.

Ignoring the noise, it seemed like crime rate was lower near the start and end of the year with two prominent peaks between Apr and Jun.
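A centered moving average is one common way to look past day-to-day noise in a daily series. The source does not say any smoothing was applied, so this is an illustrative assumption, as is the 7-day window. The input is the first ten daily crime counts from the table printed earlier. `stats::filter()` is called with its namespace because `dplyr`, when attached, masks `filter()`.

```r
# First ten daily crime counts (2015-01-01 to 2015-01-10) from the table above.
crime_count <- c(10, 9, 5, 2, 5, 8, 10, 10, 19, 20)

# Centered 7-day moving average; the window length is an assumption.
window <- 7
smoothed <- as.numeric(
  stats::filter(crime_count, rep(1 / window, window), sides = 2)
)

# The first and last three days have no full window and are NA.
print(round(smoothed, 2))
```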

Then, we looked at crime count by crime type. The picture was even less clear, with fluctuations dominating the graph.

In an attempt to isolate the noise, we looked at the crime count for just the top 5 stations (in terms of highest crime rate). The peaks between Apr and Jun were still there but there was now a new higher peak in Nov which was not visible previously in the overall chart. The same peak could be observed when we looked at crime rate by type.

Next, we looked at human traffic and crime count across time and there seemed to be a positive correlation. This matched our earlier observation with scatter plots.

We also tried plotting rainfall with human traffic, and rainfall with crime count, but these two graphs were dominated by large fluctuations in rainfall and were not informative. It was clear to us that scatter plots offered a better way to visualize the relationship between two variables (e.g., crime and rainfall), especially when we did not believe that time influenced both in the same way.